Generating 3D architectured nature-inspired materials and granular media using diffusion models based on language cues

Abstract
A variety of image generation methods have emerged in recent years, notably DALL-E 2, Imagen and Stable Diffusion. While they have been shown to be capable of producing photorealistic images from text prompts facilitated by generative diffusion models conditioned on language input, their capacity for materials design has not yet been explored. Here, we use a trained Stable Diffusion model and consider it as an experimental system, examining its capacity to generate novel material designs, especially in the context of 3D material architectures. We demonstrate that this approach offers a paradigm to generate diverse material patterns and designs, using human-readable language as input, allowing us to explore a vast nature-inspired design portfolio for both novel architectured materials and granular media. We present a series of methods to translate 2D representations into 3D data, including movements through noise spaces via mixtures of text prompts, and image conditioning. We create physical samples using additive manufacturing and assess the properties of the designed materials via a coarse-grained particle simulation approach. We present case studies using images as a starting point for material generation, exemplified in two applications. First, a design for which we use Haeckel’s classic lithographic print of a diatom, which we amalgamate with a spider web. Second, a design based on the image of a flame, amalgamated with a hybrid of spider web and wood structures. These design approaches result in complex materials forming solids or granular liquid-like media that can ultimately be tuned to meet target demands.


Introduction
Materials design using a variety of hierarchical architectures, including porous materials, has been a subject of intense research over the past decades [1][2][3][4][5][6][7]. While significant progress has been made, the exploration of novel architectures, and the use of unconventional sources as a body of knowledge to derive hierarchical structural designs, remains a challenge. One source for material design solutions is the use of biologically inspired paradigms, where a growing body of knowledge has contributed to new explorations in materials research [8][9][10][11][12][13][14][15].
Another avenue is the use of cross-cutting intersections of knowledge bases, integrating insights that incorporate a broad range of biological, human, cultural and scientific knowledge [16][17][18]. Such knowledge, collected across civilizations and eras of human development, and captured in the large body of knowledge encapsulated in the nexus of language, symbolism, images and associations between human thinking and physical or conceptual materializations of such, provides an important frontier in computationally driven design.
Deep learning offers avenues toward solving these challenges.
Recently, a variety of image generation methods have been proposed, notably DALL-E 2 [19], Imagen [20] and Stable Diffusion [21]. In this article, we focus on a specific aspect, to explore to what extent these methods can be used for broader sets of Nature-inspired materials design applications [22][23][24][25][26][27], realizing the overall approach summarized in Fig. 1. In earlier research [28][29][30][31], text-to-material translations have already been examined, including using combinations of CLIP with VQGAN [32]. This enabled not only the translation of text to material designs, based on comprehensive training data that represent a broad spectrum of all vision-text pairs created through media collections like the Internet, but also enabled researchers to direct assembly of custom material building blocks into specific shapes and patterns, such as done for the case of flame particles [28].
Here, we build on this work, but use more sophisticated image generation tools, using diffusion architectures [20, 21, 45-50] (Buehler, unpublished work [51]), and a fundamentally different approach for text generation where synthesis no longer requires an iterative examination of latent space through co-operation with a CLIP classifier, but that facilitates synthesis of images directly from a seed noise vector. This approach provides numerous advantages, including much higher resolutions and image fidelity, which, as is shown in this article, bodes well for material design applications. Moreover, models like Stable Diffusion or DALL-E 2 [19,21,52] emerge as a representation of a broader collective human corpus of visual-text pairings, which can be a powerful reservoir for materials design applications. In the spirit of what is referred to as bio-inspired design, this approach represents a variation of the concept to translate design ideas across modalities and physical realizations.
Figure 1. Schematic overview of the method developed and applied here. We start from text prompts, over which we interpolate in some form, resulting in a stack of images that form a voxel representation of a 3D material. These are then processed (here into black/white form, representing black = no material, white = material) and stacked, forming a voxel representation that is also translated into a 3D mesh. The mesh can be used either for simulation analysis or for additive manufacturing, followed by experimental assessment.
The general mathematical context of these models is that they produce an image $I$ from a given text input $T$ (and an optional image conditioning, $I_{\text{input}}$, used during image-to-image translation), as well as a set of synthesis parameters $p$. The text prompt is simply provided as string data, and $p$ includes the synthesis parameters $g$ (guidance scale in text-to-image generation, or strength parameter within 0...1 in image-to-image generation guided by text), $n$ (number of inference steps) and $\eta$ (between 0 and 1, where lower values typically yield better quality and higher values more diverse results), combined with a random seed $S$ (implemented via a PyTorch global seed):

$$p = (g, n, \eta, S). \tag{1}$$

The parameter $g$ delineates how strongly the model follows the text prompt, as opposed to generating more random solutions. Altogether, the image generation model can be mathematically summarized to yield images $I$ based on its salient input parameters as

$$I = D(T, I_{\text{input}}, g, n, \eta, S). \tag{2}$$

The standard resolution of images generated with Stable Diffusion is 512 × 512, although the method can also generate larger-resolution images (unless otherwise indicated, we use the standard resolution in image synthesis; Supplementary Fig. S1 shows examples of images generated at 512 × 512 and 1024 × 1024 resolution).
A challenge that needs to be overcome is that materials typically require 3D realizations, quite distinct from 2D image data. While there has been work on 3D model generation [53], sophisticated text-to-image diffusion models cannot yet generate 3D representations. Another proposed approach is to use neural networks to predict 3D data directly from 2D visuals [54]; however, these tools are relatively early in their development and not generally applicable. Hence, we set forth a simple but structurally diverse and rich algorithm that enables the direct use of state-of-the-art image generation tools, specifically Stable Diffusion [21] (the method could generally be applied to other models as well, including novel neural network architectures that are trained specifically with this downstream application in mind). We further outline a method to rapidly assess properties of the generated design, realized using a coarse-grained particle model that offers a means to assess mechanical, vibrational or other features. While this is beyond the scope of this initial article, future work could use this pipeline to conduct a broad search of the design space defined by the nexus of the lexicon of human language, knowledge and mathematically or statistically parameterized, or learned, latent space.

Generative model: interpolating and mixing text conditioning
The method used here is based on a pre-trained Stable Diffusion model with the sd-v1-4.ckpt [55] weights. We create a variety of functions that change the way in which images are synthesized, enabling multiple text prompts and an iteration between such text prompts using a weighting function. In the Stable Diffusion model, a text prompt $T$ is translated into an embedding tensor $E$ that encodes the input in a form that the generative diffusion model understands as a conditioning to produce a particular image that reflects the text.
Building on this approach, in order to use multiple text prompts, we first generate embeddings $E_1$ and $E_2$ for the two text prompts $T_1$ and $T_2$ provided, following generally $E_i = f_{\text{text emb}}(T_i)$. The two embeddings are then mixed according to a weight $k$ (between 0 and 1). This results in an expanded generator model that features multiple text prompts and the weight parameter $k$:

$$I = D(T_1, T_2, I_{\text{input}}, g, n, \eta, S, k). \tag{5}$$

Before translation to a 3D representation, the images $I$ generated by the diffusion model are processed. In this study, we focus on processing to convert shades of color or intensity into a binary representation, primarily because we aim to design and manufacture materials with two material types only: void, and material present, at each point in 3D.
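The mixing of two prompt embeddings can be illustrated with a minimal sketch. The exact mixing rule is not reproduced in the text above, so the linear form $(1 - k) E_1 + k E_2$ used here, with $k = 0$ returning the first prompt, is an assumption. The function name `mix_embeddings` is hypothetical, and random arrays of shape (77, 768), the text-embedding shape of Stable Diffusion v1 models, stand in for real text-encoder outputs.

```python
import numpy as np

def mix_embeddings(e1, e2, k):
    """Linearly blend two prompt embeddings (assumed form: (1 - k) * e1 + k * e2)."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    if e1.shape != e2.shape:
        raise ValueError("embeddings must have the same shape")
    return (1.0 - k) * e1 + k * e2

# Random stand-ins for the embeddings of two prompts (not real encoder output).
rng = np.random.default_rng(33)
E1 = rng.normal(size=(77, 768))
E2 = rng.normal(size=(77, 768))

# Three interpolation steps between the two prompts: k = 0, 0.5, 1.
weights = np.linspace(0.0, 1.0, 3)
mixed = [mix_embeddings(E1, E2, k) for k in weights]
```

Each mixed embedding would then be passed to the diffusion model as its text conditioning, with all other parameters and the seed held fixed, so that only the weight changes from image to image.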
First, we translate the image into a binary B&W representation using cv2.threshold. We resize the image to the desired output resolution, and apply cv2.bilateralFilter and cv2.GaussianBlur, followed by a second cv2.threshold operation. This helps to generate smooth contours of white (material) and black (no material) distributions. Next, we remove small white areas from each image to avoid too many small clusters that can be difficult to manufacture. We achieve this by finding all contours using cv2.findContours, and then removing areas below a certain threshold by drawing over them in black. The approach could easily be modified to use other transformations, for example, realizing multiple-material outputs depending on color or texture produced by the generative model.
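The binarization and small-area removal steps can be sketched without OpenCV. This is a simplified stand-in, not the pipeline described above: it skips resizing and smoothing, and it replaces the contour-based area test with a pixel count over 4-connected regions found by flood fill. The function names and the `min_area` value are hypothetical.

```python
import numpy as np
from collections import deque

def binarize(gray, threshold=127):
    """Map a grayscale image (0..255) to a boolean mask: True = material."""
    return np.asarray(gray) > threshold

def remove_small_regions(mask, min_area):
    """Erase 4-connected True regions with fewer than min_area pixels."""
    mask = np.asarray(mask, dtype=bool)
    cleaned = mask.copy()
    visited = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = []
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and mask[rr, cc] and not visited[rr, cc]:
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            # Paint regions that are too small back to black (no material).
            if len(region) < min_area:
                for r, c in region:
                    cleaned[r, c] = False
    return cleaned

gray = np.zeros((8, 8), dtype=np.uint8)
gray[1:5, 1:5] = 255   # 16-pixel block, large enough to keep
gray[6, 6] = 200       # isolated bright pixel, too small to manufacture
mask = remove_small_regions(binarize(gray), min_area=4)
```

A pixel count and a contour area are not identical measures, so thresholds tuned for one would need to be re-tuned for the other.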
Within the set of parameters $p$, we typically use $g = 7.5$ and $n = 20$. Random seeds are utilized to generate images from noise vectors, and controlling the seeds enables us to reproduce results deterministically.

Generating a voxel representation in 3D and translation to a mesh model
In order to generate 3D architectures, we introduce a voxel representation. As a basic step, each image is translated into one sheet of voxels as described above. We either use the diffusion model to generate a series of images, and from them voxel sheets that are stacked together to form a 3D representation of the material, or use a small set of voxel sheets and generate interpolations between them. In the interpolation case, we use scipy.interpolate.interpn to interpolate between two contours, resulting in a smooth transition between the top and bottom voxel sheets (for a schematic of how this process works, see the example in Fig. 2a, left).
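The idea of filling the space between two voxel sheets can be sketched as follows. This is an illustration under stated assumptions rather than the interpolation routine named above: each sheet is represented by a smooth scalar field that is positive where material is present, the two fields are blended linearly layer by layer, and each blended layer is thresholded at zero. The helper names `disc_field` and `interpolate_sheets` are hypothetical.

```python
import numpy as np

def disc_field(size, radius):
    """Scalar field that is positive inside a centered disc of the given radius."""
    y, x = np.mgrid[0:size, 0:size]
    center = (size - 1) / 2.0
    return radius - np.hypot(x - center, y - center)

def interpolate_sheets(field_bottom, field_top, n_layers):
    """Blend two scalar fields layer by layer and threshold at zero (True = material)."""
    if n_layers < 2:
        raise ValueError("need at least two layers")
    layers = []
    for t in np.linspace(0.0, 1.0, n_layers):
        blended = (1.0 - t) * field_bottom + t * field_top
        layers.append(blended > 0.0)
    return np.stack(layers, axis=0)  # shape: (n_layers, height, width)

# A small disc at the bottom grows into a large disc at the top.
voxels = interpolate_sheets(disc_field(32, 4.0), disc_field(32, 12.0), n_layers=9)
```

Blending smooth fields rather than the binary masks themselves matters here: a linear blend of two binary masks thresholded at one half would switch abruptly from one shape to the other instead of morphing between them.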
For applications in analysis (coarse-grained modeling) or additive manufacturing, the voxel data are translated into a 3D mesh representation using trimesh. For this step, the voxel stacks are processed using the trimesh.voxel.ops.matrix_to_marching_cubes function. As an option, the resulting meshes are smoothed using trimesh.smoothing.filter_mut_dif_laplacian or used directly, as predicted.

Coarse-grained model for mechanical assessment
To illustrate the potential to examine physical properties of the generated structures, we implement a coarse-grained LAMMPS model [56]. We consider the mesh file (e.g. loaded as an STL file) and insert a regular face-centered cubic particle structure inside the mesh (alternatively, we can work directly with the voxel data, but using mesh data as input enables users to process the mesh files in other code or combine multiple mesh files into larger assemblies). Each particle interacts according to a harmonic inter-particle energy potential with stiffness $k$ and equilibrium distance $r_0$. We choose $k = 1.5$ and $r_0 = 1.0$ (this stiffness $k$ is distinct from the prompt-mixing weight $k$). Only nearest-neighbor particles interact. Bonds can never break during simulations, as follows from the harmonic form of the interparticle energy model.
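A harmonic bond with the stated parameters can be written out in a few lines. The functional form is not reproduced in the text, so the common convention $U(r) = \tfrac{1}{2} k (r - r_0)^2$ used below is an assumption, and the function names are hypothetical. Note that the LAMMPS `bond_style harmonic` instead uses $E = K (r - r_0)^2$, with the factor of one half folded into $K$.

```python
def harmonic_energy(r, k=1.5, r0=1.0):
    """Bond energy, assuming U(r) = 0.5 * k * (r - r0)**2."""
    return 0.5 * k * (r - r0) ** 2

def harmonic_force(r, k=1.5, r0=1.0):
    """Radial force -dU/dr: negative (attractive) when the bond is stretched."""
    return -k * (r - r0)

# Tabulate energy and force for compressed, relaxed and stretched bonds.
for r in (0.9, 1.0, 1.1, 2.0):
    print(f"r = {r:.1f}  U = {harmonic_energy(r):.4f}  F = {harmonic_force(r):+.4f}")
```

Because the energy grows without bound as the bond is stretched, the restoring force never vanishes, which is why bonds in such a model cannot break.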
We implement displacement boundary conditions, where the top and bottom row of the particle system are fixed and move according to a prescribed pulling rate to implement mechanical deformation.
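The displacement boundary condition can be sketched as a single loading step. This is a hedged illustration only: it assumes that the top and bottom layers are both moved apart along the z axis at the same rate, and it uses a toy simple-cubic block in place of the particle model. The function name, `grip_height`, `pull_rate` and `dt` are hypothetical, and in an actual simulation the interior particles would be updated by the molecular dynamics integrator.

```python
import numpy as np

def apply_tensile_step(positions, grip_height, pull_rate, dt):
    """Move the top and bottom particle layers apart along z by pull_rate * dt each."""
    positions = np.array(positions, dtype=float)  # copy; the caller's array is untouched
    z = positions[:, 2]
    # Select the grip regions before any particle is moved.
    top = z >= z.max() - grip_height
    bottom = z <= z.min() + grip_height
    positions[top, 2] += pull_rate * dt
    positions[bottom, 2] -= pull_rate * dt
    return positions, top, bottom

# Toy 3 x 3 x 5 simple-cubic block standing in for the particle model.
grid = np.array(
    [[x, y, z] for x in range(3) for y in range(3) for z in range(5)],
    dtype=float,
)
moved, top, bottom = apply_tensile_step(grid, grip_height=0.5, pull_rate=0.01, dt=10.0)
```

Repeating this step while relaxing the interior particles between steps gives the strain history from which a stress-strain curve can be recorded.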
The resulting data are analyzed using Python scripts and visualized using OVITO [57]. We use mesh representations of the particle model to visualize the 3D structures, with color codes to indicate the stress level (blue = low stress and red = high stress).
Figure 4. Generation of architectured materials from diffusion models. We use two text prompts $T_1$ = 'several small white circles on black background' and $T_2$ = 'a large triangle in the shape of a spider web on black background' and interpolate in three steps between them. Parameters used are $p = (g = 0.8, n = 20, S = 33)$. (a) Depiction of the three designs generated; however, only the left and center designs are used for interpolation. It is seen that the mixing of the two prompts at $k = 0.5$ (the center result) yields an interesting design that neither text prompt alone could have generated.

Additive manufacturing
We use an Ultimaker S3 fused deposition modeling (FDM) 3D printer with white polylactic acid (PLA) filament to produce the physical samples. The granular media are produced using a QIDI PRO 3D printer with wood-based PLA filament, hence the wood-like color visible in the photographs. Supplementary Movie M6 shows a video of the additive manufacturing process for this and some of the other samples reported in this article.

Results
The purpose of this article is to report the general methodology and to implement a first demonstration of the proposed concept. We cover both the generation of 3D architectures from various text prompts and the generation of a new form of granular media from a large ensemble of text-generated particles.
First, we demonstrate the use of the mixed text embeddings in generating continuously varying images, as shown in Fig. 3. We use $T_1$ = 'a fracture in glass with sharp edges' together with a second prompt $T_2$. This example already shows an interesting transition between the two prompts; in early stages, the generated image reflects that of prompt $T_1$ whereas at the end, it approaches the solution of $T_2$. The transition between the two is not 'linear' but rather induces more sudden transitions in the type of generated images. We use this example as motivation to generate complex, more abstract designs by mixing individual prompts. Figure 3 depicts results for two distinct prompts as input, with smooth interpolation between them to generate a larger voxel representation of a 3D engineering design. Going into details of the process, Fig. 3a shows the two source images generated from the prompts $T_1$ = 'a small white circle on black background' and $T_2$ = 'a large white hexagon with sharp edges on black background'. Figure 3b shows the resulting 3D geometry viewed from different angles. Finally, Fig. 3c shows physical samples manufactured using 3D printing.
Next, we explore the potential of mixing text prompts, via the weight $k$, as a way to yield interesting designs. Figure 4 depicts the results of generating architectured materials [58] through interpolation, by varying the $k$ parameter in Equation (5). We use two text prompts $T_1$ = 'several small white circles on black background' and $T_2$ = 'a large triangle in the shape of a spider web on black background' and interpolate in three steps between them. Figure 4a shows a depiction of the three designs generated. Only the left and center designs are used for interpolation; the center design reflects a particularly interesting outcome, achieved by mixing the two prompts at $k = 0.5$. Figure 4b shows a visualization of the resulting 3D geometry from two angles, from top and bottom, to show the distinct features at either end. To illustrate the process by which the design is manufactured, Fig. 4c shows details of the slicing in the 3D printing process (we use an internal gyroid structure). Figure 4d then depicts the final 3D printed sample of the material.
The image generation model with its various parameters delineated in Equation (5) can be used to define a set of building blocks that can be arbitrarily combined to yield interesting combinatorial options. Figure 5 realizes this idea and presents designs from simple original input, assembled in different organizations. The original generation results in four designs $I_1$, $I_2$, $I_3$ and $I_4$ (since $I_3 \approx I_4$, we only use the first three in the design process). As input we use two simple text prompts $T_1$ = 'a white circle on black background' and $T_2$ = 'one large white square centered on black background' and interpolate in four steps between them. Figure 5a shows the original image results $I_i$ before processing. Figure 5b then shows various permutations of the elemental designs and the resulting 3D structures. Figure 5c displays photographs of 3D printed samples for two of the designs.
The method is not limited to making singular architectured materials. Using additive manufacturing, we can easily generate hundreds of copies of particles, each of which is designed using the diffusion approach described here. Figure 6 demonstrates this granular media generation approach, for different design cues. In the design used in Fig. 6a, we trigger synthesis using text prompts $T_1$ = 'a white circle on black background' and $T_2$ = 'a white oval on black background'. As can be seen, this generation task results in a single particle. The singular particle is then replicated a large number of times and 3D printed, forming the granular material shown on the right. Figure 6b shows how many variations of a design can be generated using a single text prompt, $T$ = 'white bright stars in the shape of a spiral on black background'. Since this prompt results in many particles at the same time, we can directly produce a granular material from the resulting image. We generate two sets of these star-shaped granules with different heights using a simple extrusion approach. Figure 6c shows the result of a simple shaking experiment conducted to demonstrate the liquid-like mechanical behavior of the resulting material. Supplementary Movies M2-M4 show additional shaking experiments, recorded in slow motion, to demonstrate the fluid-like character of the produced material.
Figure 7. Mechanical assessment of one of the designs shown in Fig. 5. We limit the exploration to a simple tensile test (a), resulting in stress-strain curves (b), a depiction of the Von Mises stress (c), and a stress field in the 3D domain (d) and a cross-section (e). All stresses and displacements are plotted in non-dimensional units, normalized by the largest stress/displacement in the numerical experiment.
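The simple extrusion used for the star-shaped granules amounts to repeating a 2D mask along the out-of-plane direction. The sketch below is a minimal illustration under that assumption; the `extrude` helper, the toy plus-shaped particle and the two layer counts are hypothetical stand-ins for the generated star shapes and the printed heights.

```python
import numpy as np

def extrude(mask, n_layers):
    """Repeat a 2D boolean mask along a new first axis to form a prismatic voxel solid."""
    mask = np.asarray(mask, dtype=bool)
    return np.repeat(mask[np.newaxis, :, :], n_layers, axis=0)

# Toy plus-shaped particle standing in for one star-shaped granule.
particle = np.zeros((7, 7), dtype=bool)
particle[3, 1:6] = True
particle[1:6, 3] = True

# Two sets of granules that differ only in height.
thin = extrude(particle, n_layers=2)
thick = extrude(particle, n_layers=6)
```

Each voxel stack can then be converted to a mesh and printed as many times as needed to fill a container with granular material.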
While we limit ourselves largely to exploring the generation process and how various parameters affect the result, future work should focus on characterizing the designs to meet certain objective demands. For this scenario, either experimental or computational assessment methods are needed. Figure 7 shows a simple method for a mechanical assessment of one of the designs shown in Fig. 5. We limit the exploration to a simple tensile test (Fig. 7a), resulting in stress-strain curves (Fig. 7b), a depiction of the Von Mises stress (Fig. 7c), a stress field in the 3D domain (Fig. 7d) and a visualization of the internal stresses via a cross-sectional view (Fig. 7e). The results depicted here are based on a coarse-grained, shape-based mesoscale model that captures elementary structure-function relationships. Developed directly from the geometry file produced by the generative method (an STL file), the simulation approach allows us to explore various mechanical boundary conditions, including a tensile test as done here. The data produced by such a method could be integrated with a Bayesian optimization algorithm, and parameters in Equation (5) could be tuned to meet certain design demands.
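The Von Mises stress and the normalization used in the plots follow standard definitions, which can be sketched as below. The function names are hypothetical, the stress tensors are toy values rather than simulation output, and the extraction of per-particle stresses from the simulation is not shown.

```python
import numpy as np

def von_mises(stress):
    """Von Mises equivalent stress of a symmetric 3 x 3 Cauchy stress tensor."""
    stress = np.asarray(stress, dtype=float)
    # Remove the hydrostatic part; only the deviatoric part contributes.
    deviator = stress - np.trace(stress) / 3.0 * np.eye(3)
    return float(np.sqrt(1.5 * np.sum(deviator * deviator)))

def normalize(values):
    """Scale a series by its largest absolute value, giving non-dimensional data."""
    values = np.asarray(values, dtype=float)
    peak = np.max(np.abs(values))
    return values / peak if peak > 0 else values

uniaxial = np.diag([0.0, 0.0, 2.0])      # pure tension along z
hydrostatic = np.diag([1.0, 1.0, 1.0])   # equal stress in all directions
```

For pure uniaxial tension the Von Mises stress equals the applied stress, and for a purely hydrostatic state it vanishes, which makes these two cases convenient sanity checks.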
We now show a couple of systematic explorations of key parameters in Equation (5) and examine how they affect the produced images, and by extension, how they affect the design cues we can utilize for 3D material construction. Figure 8 displays the results of systematic variations of inference steps $n$ and guidance scale $g$. Similarly, Fig. 9 shows results of a systematic variation of inference steps $n$ and guidance scale $g$, but this time for two text prompts $T_1$ = 'several small white circles on black background' and $T_2$ = 'a large triangle in the shape of a spider web on black background' mixed with $k = 0.5$.
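A systematic variation of two parameters is a grid sweep with everything else held fixed. The sketch below only builds the parameter grid; `fake_generate` is a hypothetical placeholder for the diffusion model, and the specific lists of steps and guidance scales are illustrative values, not those used for Figs. 8 and 9 (only $g = 7.5$, $n = 20$ and $S = 33$ appear in the text).

```python
from itertools import product

def build_sweep(steps, guidance_scales, seed=33):
    """Return one parameter dictionary per (n, g) combination, in row-major order."""
    return [
        {"n": n, "g": g, "S": seed}
        for n, g in product(steps, guidance_scales)
    ]

def fake_generate(params):
    """Hypothetical stand-in for the diffusion model; returns a label, not an image."""
    return f"image(n={params['n']}, g={params['g']}, S={params['S']})"

# Illustrative grid: three step counts times three guidance scales, one fixed seed.
sweep = build_sweep(steps=[10, 20, 50], guidance_scales=[2.5, 7.5, 12.5])
results = [fake_generate(p) for p in sweep]
```

Keeping the seed fixed across the grid isolates the effect of the two swept parameters, since the starting noise is then the same for every image.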
Thus far, all image generation tasks were conducted solely based on text prompts (either a single one or mixed prompts to enhance design diversity). Now, we also use an input image to condition the generation, in addition to one or more text prompts. As will be shown, this offers a tremendous range of controllability and expressiveness in terms of inducing highly complex design concepts that cross or amalgamate the cues provided. Figure 10 shows the results of these computational experiments, depicting generation of a variety of images based on a starting image as an additional input. In this case, we use a diatom structure as the input image (Fig. 10a), for variations of the strength $g$ and the number of inference steps $n$.
The diatom image is extracted from Haeckel's lithographic print of a diatom from Kunstformen der Natur (English: Art Forms in Nature), as reported in Ref. [59]. The text prompt is $T$ = 'a spider web with thick white lines on black background'. Results are shown in Fig. 10b. The palette of designs can then serve as a starting point for further exploration or can be combined into a set of material building blocks akin to what was presented in Fig. 5. By playing with the text prompt, one can achieve very distinct material design outputs. Figure 11 shows a variation of the sampling parameter $\eta$ and the strength $g$ (all while using the same input image as shown in Fig. 10a). The text prompt used here is $T$ = 'several small white circles on black background'. Another example for different parameter variations is shown in Supplementary Fig. S2. These broad variations in design can be turned into 3D architectured materials, as shown in Fig. 12 for a few examples.
Figure 9. Systematic variations of the inference steps $n$ and guidance scale $g$. Text prompts are $T_1$ = 'several small white circles on black background' and $T_2$ = 'a large triangle in the shape of a spider web on black background'. Other constant parameters: $p = (S = 33, k = 0.5)$.
By controlling the input image, we can expand the space of resulting nature-inspired designs. For instance, Supplementary Fig. S3 shows a candle-based design, where text prompts are $T_1$ = 'a spider web with thick lines on black background' and $T_2$ = 'the internal details of wood microstructure', with $k = 0.25$. Figure 13 presents the entire workflow from design to modeling to manufacturing, for a design based on the intersection of a flame image with such a complex text prompt, for one of the designs generated (see the red mark in Supplementary Fig. S3 for the one picked). Supplementary Movie M5 shows a movie of the tensile deformation simulation of the material. The results reveal that interesting material designs can be generated from a rich repertoire of design ideas, accessed directly via a combination of human language input, nature-based design ideas and mathematical parameterization. The resulting architectured material combines features from all of these foundational cues and amalgamates them into an intricate structural design. The mechanical assessment, following a similar approach as done earlier for the results in Fig. 7, provides us with a quantitative mechanism to better understand the design, score key performance measures and optimize the resulting designs to meet a set of demands.
Figure 11. Variation of $\eta$ and strength $g$. The text prompt is $T_1$ = 'several small white circles on black background'. Other constant parameters: $p = (S = 33)$.

Discussion and conclusion
In this article, we used a pre-trained Stable Diffusion model and considered it as an experimental system, reflecting a broad corpus of human knowledge, to examine its capacity to generate novel material designs, specifically in the context of 3D material architectures. Such materials design may find many applications ranging from optical to mechanical [60,61] or multifunctional and integrated responsive material systems [62,63].
We demonstrated that this approach offers a useful paradigm to generate a variety of novel material designs, using human language as a reservoir for cultural and civilization-spanning knowledge as design input, and exploring a vast nature-inspired portfolio of architectures and patterns. We presented a series of methods to translate 2D representations into 3D data, including movements through noise spaces, mixtures of text prompts and interpolations. We manufactured several samples using additive manufacturing [15,64,65], and presented a method to assess the mechanical features (including stability) and other structural properties of materials designed in that way, based on a coarse-grained shape-based particle simulation.
Specific objective functions that score material designs for alternative target properties, beyond mechanical stability, could be developed, based on existing literature. For instance, optimization work to design photonic crystal structures has been reported [66,67], which could be enriched with the tools described here. Within this context, materials that meet multiple design demands could be constructed, such as waveguide filters.
A challenge in the use of the pre-trained model as conducted here is that some designs may not yield continuous solutions in 3D, which may affect mechanical stability or manufacturability.
In the examples reported here, we focused specifically on relatively simple cases where we achieve a continuous material design; however, variations of such scenarios can easily be constructed that fail to produce proper mechanical designs. The mechanical analysis, as depicted in Figs 7 and 13, is critical to provide a rigorous assessment of stability. In an unsupervised algorithm that explores vast spaces of designs, a rapid assessment of the mechanical properties could help to identify solutions that meet certain design demands, including mechanical stability.
Figure 12 (caption excerpt). For these examples, we use the pixel color intensity to map to depth of the resulting 3D structure; bright/white = maximum height, dark/black = no material. This method is applied symmetrically in both out-of-plane directions (forward and backward) to yield a 3D material architecture.
Figure 13. Entire workflow from design to modeling to manufacturing, for a design based on the intersection of a flame image with a complex text prompt featuring $T_1$ = 'a spider web with thick lines on black background' and $T_2$ = 'the internal details of wood microstructure'. Other constant parameters: $p = (S = 33, k = 0.25)$. (a) Overview of the design steps from raw image to symmetrically extruded 3D material with a box added at the exterior as shown in (b) (we follow the same process of symmetrically extruding the image based on pixel intensity as explained in Fig. 12). Panel (c) shows a simple mechanical assessment under tensile deformation and (d) shows the resulting stress-strain results and stress field statistics for the Von Mises stress. (e) Photographs of the final manufactured material using FDM 3D printing. Supplementary Movie M4 depicts the stress field as the sample is deformed. Supplementary Movie M6 shows a recording of the additive manufacturing process for this and some of the other samples reported in this article.
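The intensity-to-depth mapping described for Fig. 12, with white giving maximum height, black giving no material, and the extrusion mirrored in both out-of-plane directions, can be sketched as follows. The rounding rule, the `max_half_layers` parameter and the function name are assumptions made for this illustration.

```python
import numpy as np

def symmetric_extrude(gray, max_half_layers):
    """Turn pixel intensity (0..255) into material thickness, mirrored about the mid-plane."""
    gray = np.asarray(gray, dtype=float)
    # Half-thickness in voxel layers for every pixel: 0 for black, max for white.
    half = np.rint(gray / 255.0 * max_half_layers).astype(int)
    # Rank of each layer counted outward from the mid-plane: 1 is nearest the mid-plane.
    offsets = np.abs(np.arange(-max_half_layers, max_half_layers) + 0.5) + 0.5
    # A voxel is filled when its layer rank does not exceed the pixel's half-thickness.
    return offsets[:, None, None] <= half[None, :, :]

image = np.array([[0, 128], [255, 64]], dtype=np.uint8)
solid = symmetric_extrude(image, max_half_layers=4)
```

Because thickness follows brightness continuously, connected bright regions in the image remain connected in 3D, while isolated bright spots produce free-floating pieces that would need to be removed or joined before manufacturing.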
The pre-trained model offers a vast space of design solutions, far exceeding the samples considered here. In particular, combining the diffusion model with initial image cues (see Figs 10-12) provides a structurally diverse and rich platform to work from. If the existing platform is not sufficient, or if specific target designs are desirable, the model can be fine-tuned or retrained based on new or expanded data. This can offer not only a mechanism to use 2D image data as reported in this article but could possibly also offer a pathway to directly construct 3D data from text cues, as long as datasets exist that map conditioning constraints to resulting structural designs. This could also address inherent biases included in datasets that yield models such as Stable Diffusion, and any limitations that stem from sourcing of the data from particular sources (e.g. Internet-based text-image pairings versus more broadly researched, culturally richer relationships that exist beyond the Internet). Generally, though, the framework used here can capture such richer, more diverse and larger datasets and provide ample room for improvements in a variety of forms.
The use of deep learning, and especially generative methods, opens important frontiers in materials design. The use of conditional diffusion models as used here can be expanded, or altered, to reflect materials-specific training sets. Thereby, future applications of the technology presented here can focus on models trained specifically to capture hierarchical materials, or a specific subset of bio-based material designs.
The general concepts introduced here offer many opportunities for future work, such as targeted optimization for specific material properties including mechanical deformation and fracture [68][69][70]. It will also be interesting to further examine the image generation method and explore, systematically, the effect of variations in text prompts on the final design. Another target of study could be the exploration of variability, especially in light of recent findings that natural variability in designs seen in Nature often yields superior material performance [71][72][73]. Using the source of such natural variability, as predicted in the designs using Stable Diffusion, could be an interesting topic of future research.

Data availability
Data are incorporated into the article and its Online Supplementary Material. Other data are available on request.

Funding
This work was supported by the Army Research Office (W911NF1920098), National Institutes of Health (U01EB014976 and 1R01AR077793) and Office of Naval Research (N00014-19-1-2375 and N00014-20-1-2189). Additional support from MIT-IBM Watson AI Lab is acknowledged.